Modern Hopfield Network
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.24)
- North America > United States (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Transportation > Passenger (0.67)
- Transportation > Ground > Road (0.67)
- Automobiles & Trucks > Manufacturer (0.67)
- North America > United States (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
CLOOB: Modern Hopfield Networks with InfoLOOB Outperform CLIP
CLIP yielded impressive results on zero-shot transfer learning tasks and is considered a foundation model like BERT or GPT-3. CLIP vision models, which have a rich representation, are pre-trained using the InfoNCE objective and natural language supervision before they are fine-tuned on particular tasks. Though CLIP excels at zero-shot transfer learning, it suffers from an explaining-away problem: it focuses on one or a few features while neglecting other relevant features. This problem is caused by insufficiently extracting the covariance structure in the original multi-modal data. We suggest using modern Hopfield networks to tackle the problem of explaining away. Their retrieved embeddings have an enriched covariance structure derived from co-occurrences of features in the stored embeddings.
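The two ingredients named in this abstract, Hopfield retrieval and a leave-one-out contrastive objective, can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the CLOOB authors' implementation: the function names, the values of `beta` and `tau`, the random toy embeddings, and the one-directional loss are all illustrative choices. Retrieval uses the standard softmax-weighted average over stored patterns, and InfoLOOB is taken to differ from InfoNCE only in leaving the positive pair out of the denominator.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def hopfield_retrieve(queries, stored, beta=8.0):
    # One modern Hopfield update: a softmax-weighted average of stored patterns.
    weights = softmax(beta * queries @ stored.T, axis=-1)
    return l2_normalize(weights @ stored)

def info_nce(anchors, positives, tau=0.1):
    # The denominator sums over the positive pair and all negatives.
    sims = anchors @ positives.T / tau
    log_denominator = np.log(np.exp(sims).sum(axis=1))
    return float(np.mean(log_denominator - np.diag(sims)))

def info_loob(anchors, positives, tau=0.1):
    # Leave-one-out bound: the positive pair is excluded from the denominator.
    sims = anchors @ positives.T / tau
    exp_sims = np.exp(sims)
    negatives_only = exp_sims.sum(axis=1) - np.diag(exp_sims)
    return float(np.mean(np.log(negatives_only) - np.diag(sims)))

# Toy paired embeddings (assumption: random data standing in for encoder outputs).
rng = np.random.default_rng(0)
image_emb = l2_normalize(rng.normal(size=(6, 16)))
text_emb = l2_normalize(image_emb + 0.1 * rng.normal(size=(6, 16)))

# Retrieve both modalities from the stored image embeddings.
image_retrieved = hopfield_retrieve(image_emb, image_emb)
text_retrieved = hopfield_retrieve(text_emb, image_emb)

loss_nce = info_nce(image_retrieved, text_retrieved)
loss_loob = info_loob(image_retrieved, text_retrieved)
print("InfoNCE :", loss_nce)
print("InfoLOOB:", loss_loob)
```

Because the InfoLOOB denominator omits the positive term, its value on the same batch is always lower than the InfoNCE value. A full training objective would typically also add the symmetric term with the roles of the two modalities swapped.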
- North America > United States (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > Canada (0.04)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government (0.68)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Framework for Non-Linear Attention via Modern Hopfield Networks
In this work we propose an energy functional along the lines of Modern Hopfield Networks (MHN), whose stationary points correspond to the attention mechanism of Vaswani et al. [12], thus unifying both frameworks. The minima of this landscape form "context wells": stable configurations that encapsulate the contextual relationships among tokens. A compelling picture emerges: across $n$ token embeddings an energy landscape is defined whose gradient corresponds to the attention computation. Non-linear attention mechanisms offer a means to enhance the capabilities of transformer models for various sequence modeling tasks by improving the model's understanding of complex relationships, learning of representations, and overall efficiency and performance. A rough analogy is cubic splines, which offer a richer representation of non-linear data where a simpler linear model may be inadequate. This approach can be used to introduce non-linear heads in transformer-based models such as BERT, [6], etc.
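The link between an energy landscape and attention can be checked numerically on the standard modern Hopfield energy, E(q) = -(1/beta) * logsumexp(beta * K q) + 0.5 * ||q||^2, whose gradient is q - K^T softmax(beta * K q). This is a sketch of that well-known energy, not of the functional proposed in this abstract, which is not given here; the names `keys`, `query`, and `beta`, and the choice to let the keys double as values, are assumptions made for illustration.

```python
import numpy as np

def logsumexp(z):
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def energy(query, keys, beta=1.0):
    # Standard modern Hopfield energy, up to an additive constant.
    return -logsumexp(beta * keys @ query) / beta + 0.5 * query @ query

def energy_gradient(query, keys, beta=1.0):
    # Analytic gradient: the query minus its attention readout.
    return query - keys.T @ softmax(beta * keys @ query)

def attention_update(query, keys, beta=1.0):
    # Setting the gradient to zero gives this softmax-attention fixed-point map.
    return keys.T @ softmax(beta * keys @ query)

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))   # five stored token embeddings of dimension 4
query = rng.normal(size=4)
beta = 2.0

# Central finite differences to compare against the analytic gradient.
eps = 1e-6
numeric_gradient = np.array([
    (energy(query + eps * e, keys, beta) - energy(query - eps * e, keys, beta)) / (2 * eps)
    for e in np.eye(4)
])
analytic_gradient = energy_gradient(query, keys, beta)

# Iterating the attention update should never increase the energy.
energies = [energy(query, keys, beta)]
for _ in range(10):
    query = attention_update(query, keys, beta)
    energies.append(energy(query, keys, beta))
print(energies)
```

A stationary point of this energy is a query that equals its own attention readout, which is the sense in which minima act as stable configurations over the stored tokens.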
- North America > United States (0.15)
- North America > Canada > New Brunswick > Fredericton (0.04)
Rectified Lagrangian for Out-of-Distribution Detection in Modern Hopfield Networks
Moriai, Ryo; Inoue, Nakamasa; Tanaka, Masayuki; Kawakami, Rei; Ikehata, Satoshi; Sato, Ikuro
Modern Hopfield networks (MHNs) have recently gained significant attention in the field of artificial intelligence because they can store and retrieve a large set of patterns with an exponentially large memory capacity. An MHN is generally a dynamical system defined with Lagrangians of memory and feature neurons, where memories associated with in-distribution (ID) samples are represented by attractors in the feature space. One major problem in existing MHNs lies in managing out-of-distribution (OOD) samples, because it was originally assumed that all samples are ID samples. To address this, we propose the rectified Lagrangian (RecLag), a new Lagrangian for memory neurons that explicitly incorporates an attractor for OOD samples in the dynamical system of MHNs. RecLag creates a trivial point attractor for any interaction matrix, enabling OOD detection by identifying samples that fall into this attractor as OOD. The interaction matrix is optimized so that the probability densities can be estimated to identify ID/OOD. We demonstrate the effectiveness of RecLag-based MHNs compared to energy-based OOD detection methods, including those using state-of-the-art Hopfield energies, across nine image datasets.
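For context, the energy-based baseline that this abstract compares against can be sketched as thresholding a Hopfield energy score. The code below is not RecLag, whose Lagrangian is not specified here; it is a minimal illustration that assumes the standard modern Hopfield energy as the score, a synthetic cluster of unit vectors as ID data, and a hypothetical 95th-percentile threshold taken from held-out ID samples.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def hopfield_energy(query, stored, beta=8.0):
    # Low energy means the query lies close to at least one stored pattern.
    scores = beta * stored @ query
    m = scores.max()
    lse = m + np.log(np.exp(scores - m).sum())
    return -lse / beta + 0.5 * query @ query

# Synthetic ID data (assumption): unit vectors clustered around one direction.
rng = np.random.default_rng(0)
center = np.eye(8)[0]
stored = l2_normalize(center + 0.1 * rng.normal(size=(20, 8)))
held_out = l2_normalize(center + 0.1 * rng.normal(size=(50, 8)))

# Hypothetical threshold: the 95th percentile of held-out ID energies.
held_out_energies = np.array([hopfield_energy(x, stored) for x in held_out])
threshold = np.quantile(held_out_energies, 0.95)

def is_ood(query):
    # Flag a sample as OOD when its energy exceeds the ID threshold.
    return bool(hopfield_energy(query, stored) > threshold)

far_sample = -center  # points away from every stored pattern
flagged_id_fraction = float(np.mean([is_ood(x) for x in held_out]))
print("far sample flagged as OOD:", is_ood(far_sample))
print("held-out ID fraction flagged:", flagged_id_fraction)
```

In this baseline the OOD decision depends entirely on a tuned threshold, whereas the abstract describes an explicit attractor for OOD samples inside the network dynamics.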
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > Canada > Ontario > Toronto (0.14)